- 91-06/Sarnoff.info
- From: herbt@apollo.sarnoff.com (Herbert H Taylor III)
- Subject: Virtual Reality, Volume Visualization and Video (LONG)
- Date: Wed, 05 Jun 91 11:27:20 EDT
-
-
-
-
- The recent series of postings on VR/Video following the exchange
- between myself and Chris Shaw have been most informative. I would also
- like to publicly thank Chris for asking some very tough questions. I
- can certainly take a "whipping" without losing my sense of humour -
- although I hope we can both avoid the use of the pejorative. It was
- unfortunate that the HDTV emphasis of previous posts defocused the
- more important topic of VR Architectures. Chris challenged the
- applicability of those ideas to VR and, while I remain convinced that
- they will prove important, they seem to represent a small conceptual
- detour. Likewise, a strong technical challenge was raised to our
- description of several 3D video based scenarios. Much of the technical
- criticism which followed was the result of our poor description of our
- ideas. Real-time 3D imagery alone will not be sufficient to construct
- and manage entirely "simulated" virtual worlds. We will not be able to
- look under the Kitchen table for the "Chewing Gum". (Of course if the
- CGI modeler doesn't have a "Chewing Gum" model, "it", to borrow a
- favorite colloquial expression of Chris', "ain't there either.")
- Whether these limitations preclude the use of 3D imagery as a useful
- interface component of VR remains a topic of further research...
-
- In our original speculative post of ca. 3/10/91 (responding to our
- moderator's request for summaries of current research), we mused about
- our desire to explore what VR will be like in ten years. We admitted
- that the system we were using (aka the Princeton Engine) was not a
- general solution to VR processing. I hope, however, that we can use
- Supercomputers as vehicles to explore otherwise impossible ideas.
- Machines such as the UNC PxPL5, the CM2 or the Princeton Engine are
- not "practical" single user architectures - at least not yet - but
- surely in not too many years we will have desktop massive parallelism
- with the potential for real-time interactive VR applications. This
- leads naturally to motivating questions: will the "ultimate" VR system
- of the year 2000 be more SGI-MIMD like - with a large number of
- powerful processors in the rendering pipeline - or will it be a
- hybrid of SIMD and MIMD, as in the PxPL5? Perhaps VR specific
- architectures will emerge. What impact does our choice of VR world
- model have on architecture? Is the evolution of VR only going to be in
- the direction of increasingly realistic CGI rendering? Are worlds
- derived from sampled data going to become more viable?
-
- What are the applications which motivate future VR architectures?
- Certainly, the potential for applications in medicine, architecture,
- simulated experience, "gaming" and product design will continue to
- provide motivation for developing systems with improved visual realism
- and more natural interactions. Likewise, scientific data visualization
- offers fertile ground for VR research and future application - where
- we have this notion of interacting with and literally "experiencing"
- our data. There has been significant independent progress in recent
- years in each of the fields of interactive data visualization, VR and,
- specifically, Volume Visualization (VV); however, it is the
- convergence of all three technologies into a single computing and
- interaction framework that will provide the true enabling leap of
- functionality. Scientists will be able to simulate and visualize
- complex phenomena and, in some sense, actually "participate" in their
- experiments.
- This kind of interaction will revolutionize scientific research in
- much the same way that the computer itself has.
-
- There are probably those who will question our enthusiasm and observe
- that our scientific forebears drew marvelous insight from very simple
- physical models of complex structure without the benefit of computers
- or graphics. It is said that the structure of benzene came to Kekule
- in a dream. Certainly, Watson and Crick were able to visualize amazing
- structure without the benefit of complex "tools"... which is exactly
- the point. When the visualization systems of the future are as easy to
- use as a box of snap together molecular models, as interactive as the
- microscope or as "free associative" as a dream - only then will they
- realize their full potential. These advances, however, will come at
- great computational cost.
-
- Where are the computational boundaries for VR? To address these
- issues we must first establish complexity bounds for VR in terms of
- computation (rendering, dynamics, constraints, etc.) and I/O. The
- processing requirements of VR have been studied in terms of system
- dynamics and constraint satisfaction by [Pentland90], giving O(n**3)
- "calculations" per object for the dynamical system, where n is the
- number of vertices. For 1000 objects of 1000 vertices each, 100 TFLOP
- performance is required to achieve interactivity (assuming 100
- floating point operations per system "calculation"). That astounding
- number is still two orders of magnitude removed from the NEXT
- generation of supercomputers. The author proposes a reduced
- complexity model - still with computational complexity in the 10
- GFLOP range to satisfy system constraints and 100 MFLOP for the
- dynamics. A system which implements this approach is described in
- [PentE90]. [Witkin90] also discusses the constrained dynamical system
- in some detail. Polygon rendering has been discussed in [Ake88], with
- floating point requirements in the range of 40 MFLOP for 100,000
- polygons. Further research needs to be done to reduce world complexity
- and to make higher resolution worlds with complex objects more
- tractable on near term computers.
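-
- (A quick back-of-the-envelope check of that figure - a Python sketch
- of our own, not from [Pentland90] - under the stated assumptions:
-
-     objects        = 1000   # objects in the world
-     vertices       = 1000   # vertices per object
-     flops_per_calc = 100    # assumed FLOPs per system "calculation"
-
-     # O(n**3) "calculations" per object for the dynamical system:
-     calcs = objects * vertices ** 3          # 1e12 calculations
-     flops = calcs * flops_per_calc           # 1e14 FLOPs per update
-     print("%.0e FLOPs per update" % flops)   # -> 1e+14, i.e. 100 TFLOP
-
- Even at a single dynamics update per second that is a 100 TFLOP
- machine.)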
-
- Several posters to this group have suggested that VR input processing
- requirements are quite modest - at least in terms of the data glove.
- We might ask how the complexity scales as more and higher
- "resolution" input devices are introduced. Devices such as the Eye
- tracker [Levoy90] and 3D head tracker [Wang90] would seem to add
- significant complexity to VR processing requirements even with
- dedicated interface hardware. Comparatively less research has been
- done on the physical "output" side of VR. [Minsky90] describes a
- system using the sense of touch. What other input or output devices
- can we look forward to and what are the performance specs?
-
- Clearly, the exponential growth of the computational and I/O
- requirements of VR will motivate both algorithmic and architectural
- solutions. A recent estimate by the US Government projects "teraflop"
- commercial Supercomputers before the year 2000 [USG91], with the first
- demonstration systems emerging from the DARPA High Performance
- Computing Systems (HPCS) initiative in 1992-3. Whether the spectacular
- performance of these machines can be fully harnessed for VR remains to
- be seen. Perhaps what is really needed is a combination of VR specific
- architectures with VR specific algorithms - architectures which are on
- the HPCS technology learning curve (i.e. that employ MCM packaging,
- superdense ULSI, optical interconnect, etc.) COMBINED with algorithms
- which can replace O(n**3) with, say, O(n log n). A number of research
- groups have proposed (and in some cases actually built) "application
- specific" or "algorithm specific" visualization computers, including
- the well known UNC PxPL5 [Fuchs82], the SUNY CUBE [Kauf88] and the
- Stanford SLAM [Dem86] systems. (We are not sure if a full version of
- the latter machine was ever built.) In general, these researchers were
- motivated by the desire to explore "future" visualization algorithms.
-
- Our original motivation in developing the Princeton Engine was the
- desire to explore "future" video systems. However, we have found that
- its applicability is in no way "limited" to video; it can serve as
- a useful architecture for studying visualization algorithms and future
- visualization systems. We certainly do believe we can accomplish much
- of what we described in our previous posts. After all, the system can
- turn over 30 (16bit) GIGAOPS, or over 1 GFLOP (for you scientific
- types). More important, ALL system I/O is continuous and transparent
- to the CPU. A 48bitx28MHZ (=1.4GBPS) digital input bus and a
- 64bitx28MHZ (= 1.8GBPS) digital output bus can drive any combination
- of analog or digital I/O devices. Transparent, continuous gigabit I/O
- should be important to the NEXT generation of VR peripherals: Data
- Gloves, eye trackers, you invent it. Finally, while there is no
- special system requirement that either the I/O or the application be
- "video", a typical application will often COMBINE scientific computing
- AND real-time data visualization.
-
- Real-time Interactive Volume Visualization
- ------------------------------------------
- To calibrate this machine for graphics applications, we are in the
- process of implementing a real-time volume rendering system that can
- arbitrarily rotate and render 256x256x256 volumes at 30fps. We believe
- we can volume render 1Kx1Kx1K at about 8fps using single axis
- rotation. This is possible because the Princeton Engine can perform a
- "continuous" real-time transpose (512x512x32bitsx30fps) for very
- little CPU cost. The programmer effectively has an array and its
- transpose as working data structures. At any line "time" each
- processor has a row and column of the current frame in hand.
- Therefore, "scanline" algorithms are relatively straightforward...
-
- A number of recent papers suggest that Volume Visualization (VV) and
- Virtual Reality are closely related, convergent applications. The
- "edvol" system combines a VPL Data Glove and 3SPACE Polhemus Tracker
- to provide direct interaction with volumetric data [Kauf90]. The
- authors do not characterize the size and complexity of either the VR
- or the VV but do describe it as "small scale". It often seems as
- though the electronic "media" believe that VV is "already" a standard
- component of VR. This obvious misconception has occurred because volume
- visualization demos are usually presented as real-time interactive
- simulations, when the visualization actually took CPU hours to
- orchestrate. But the "dream" clearly is real-time interactive volume
- rendering and visualization. One of the best demonstrations I have
- seen of the potential for a VV fly through was presented by Mark Levoy
- (then of UNC) using volume rendered CT. In [Levoy89] the intended use
- of a head mounted display interface to the PxPL5 is described while in
- [Levoy90] the use of eye tracking hardware is described - in both
- cases specifically for VV. UNC's Steve Pizer showed a video of 8fps
- single axis rotation volume rendering on PxPL5 at the San Diego
- Workshop on VV. A head display system has also been developed at UNC
- to assist radiologists in treatment planning. Although these examples
- may not in all cases qualify as pure "VR" they certainly speak to the
- potential for a real-time interactive VR interface to a volume
- visualization environment.
-
- Volumetric data sets come in two basic forms: "real" sampled data
- (as in CT, MRI, Ultrasound, Optical or Xray Microscopy, etc.) and
- computed or "synthetic" data (weather models, CFD, etc.). In the latter
- case the volume is usually "simulated" while in the former the "raw"
- data is sampled and sometimes preprocessed before the volume is
- rendered. For example, before an MRI image is produced the "raw"
- sampled data must be Fourier Transformed. With either approach the
- resulting data set is a 3D spatial volume. In the case of synthetic
- simulated data there is also a "timestep" - the fluid flows, the
- turbine spins, etc. With sampled data there is often no clear notion
- of time - the data is entirely static; however, the interaction with
- the data can be dynamic and even involve the "introduction" of time. A
- traveler passing through a sampled and rendered volume certainly
- experiences the passage of time, however, the "world" itself remains
- static. By analogy one can imagine walking through a museum (static
- sampled "objects") verses walking along the bank of a river (dynamic
- simulated "objects"). In our conceptual museum, as we begin to
- interact with objects we can simplify the system constraint dynamics
- as much as desired, literally "determining" the laws of physics. That
- Ming Dynasty vase I knocked over? It never touched the ground. If we
- are "in" an MRI or CT museum we might wish to change opacity, point of
- view or other parameters which affect our visual perception of the
- phenomena we are studying. Of course, the same control of time is
- possible in the "synthetic" case, however, only at the risk of
- undermining the scientific interpretation of the simulation i.e.
- correctly visualizing and understanding the physics is often
- fundamental to the experiment.
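-
- (As a toy illustration of that sampled-data path: an MRI slice is,
- to first order, just the inverse Fourier transform of the raw
- "k-space" samples. A minimal Python sketch of our own, with random
- numbers standing in for scanner output:
-
-     import numpy as np
-
-     rng = np.random.default_rng(0)
-     kspace = (rng.normal(size=(256, 256))
-               + 1j * rng.normal(size=(256, 256)))
-
-     # Inverse 2D FFT of the centered k-space gives the image; real
-     # reconstructions add filtering, coil combination, etc.
-     slice_img = np.abs(np.fft.ifft2(np.fft.ifftshift(kspace)))
-
- Each such slice then becomes one plane of the rendered volume.)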
-
- With the emergence of increasingly real-time instrumentation there is
- a second form of sampled data to consider: 3D spatial volumes with
- time varying data. Imagine a sampled volume of a living organism or
- dynamic micro structure which is updated 30 times a second. If we were
- "inside" this museum while it was "open" we could watch cells as they
- proceed through nuclear envelope breakdown, divide and emerge as two
- identical cells. (We are working with this kind of data now.) The
- degree to which this form of interaction is a Virtual Reality results
- not from our ability to "alter the experiment" on the fly, but from
- our ability to control the dynamics of how we "view" the experiment
- while it is taking place. We may ultimately be able to turn up the
- heat or add some catalyst to a chemical reaction from within the
- Virtual experience but that ability neither defines VR nor does the
- absense of that ability preclude VR. IMO, it is the VR observers
- perceived sense of simulated presence combined with the ability to
- control the visual experience which principally defines the
- interaction as VR.
-
- Two related projects provide sources of volumetric data which we are
- using at Sarnoff and which we feel have VR "prospects": from an
- experimental ultrasound instrument and from a differential interference
- contrast (DIC) microscope which produces a sequence of image slices
- through a cell embryo. The DIC volume can be acquired at or near
- real-time with the latest instrumentation - hence, a "video volume"
- (sorry Chris, it really can be...) In the case of the ultrasound
- instrument the Princeton Engine will also perform the front-end signal
- processing required to produce a volume from the sampled data.
- Presently, a "raw" 3D ultrasound data set cannot be acquired in
- real-time, however, the signal processing to produce a data volume
- from an unprocessed data set can potentially be accomplished in
- real-time, as can the Volume Rendering process. The KEY POINT is that
- we are going to see more and more real-time instrumentation which can
- produce true sampled data volumes. As the acquisition times of systems
- such as MRI, ultrasound, Electron Microscopy and DIC decrease we will
- also see greater coupling between the front-end signal processing, the
- data visualization and perhaps even the user interaction. At a recent
- workshop at Princeton University, "Seeing Into Materials: Imaging
- Complex Structures", both optical and EM microscopy systems capable of
- real-time 3D acquisition were described... Actually "4D" is used to
- refer to a 3D spatial volume "moving" in time. BTW, these scientists
- have a strong intuitive feel for the potential of VR - at least what
- they call "VR". That is, the ability to interact with an experiment -
- either in situ OR as part of post analysis - as in our museum
- examples.
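-
- (For flavor, here is a toy Python sketch of the classical back end
- of that kind of ultrasound processing - envelope detection and log
- compression of raw RF scan lines. The function and parameter names
- are ours, and this is the generic textbook step, not a description
- of our actual signal chain:
-
-     import numpy as np
-     from scipy.signal import hilbert
-
-     def rf_to_bmode(rf_lines, dyn_range_db=60.0):
-         # One RF A-line per row. Take the envelope via the analytic
-         # signal, then log-compress into a fixed display range.
-         env = np.abs(hilbert(rf_lines, axis=1))
-         env /= env.max() + 1e-12
-         bmode = 20.0 * np.log10(env + 1e-12)
-         return np.clip(bmode, -dyn_range_db, 0.0) + dyn_range_db
-
- Stacking the processed scan planes yields the data volume.)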
-
- Is it fair to ask where in the taxonomy of VR systems we should place
- these kinds of applications? True, the worlds are derived from various
- real world spectra, but the interactions are entirely SIMULATED - one
- can change viewing parameters, etc. The exact meaning of virtually
- touching "objects" or surfaces in such worlds remains unclear... but
- really no more so than in a CGI simulated world where everything is
- built from models. In either case the eventual consequence of our
- fundamental interactions within a world must be determined by a "law
- giver". If I put my data gloved hand directly into the burner of a VR
- Kitchen stove what happens?
-
- It should be noted that there are several potential problems with
- these methods of data visualization. First, a number of people report
- varying degrees of motion sickness when observing through a head
- mounted display. That may be acceptable if I am performing a "hammer-
- head" maneuver in my Super Decathlon simulator, but probably is not
- acceptable if I am inside someone's brain. (Informal Poll: How many of
- you have experienced this effect? RSVP and I will tabulate.) A second
- potential problem results from "persistence" effects on the human
- visual system. We vividly recall in the early days of VLSI CAD
- workstations the problem IC draftspersons experienced after long hours
- staring at color stripes and squares. [Frome83] describes the so
- called "McCollough effect", wherein, after looking at color stripes
- for only a short time, high contrast B&W stripes suddenly appear to
- have color where none is present. To dramatize this effect during her
- talk at DAC83, Francine Frome periodically displayed slides with green
- and red stripes. About halfway through the talk she displayed a slide
- with a striped "BTL" in bright green foreground offset from a bright
- red background - before informing the audience that the slide was
- totally black and white! It was quite remarkable. These and other
- "human factor" issues will need to be fully understood before
- head mounted displays achieve broad use and certainly before we let
- Nintendo sell one to every ten year old...
-
- More VR/Video
- -------------
- In our original post we also speculated that multiple cameras might
- be used to develop a "Video" data glove or "whole body" interface to a
- virtual world. In particular, we asked how such an interface would
- impact the future design of the data glove. We received a number of
- thoughtful comments following this post. It is important to note that
- the VR world itself COULD STILL be CGI - with video only providing a
- framework for interaction. With support for up to six cameras one
- could surround participants either individually or collectively with
- video. (Remember we pay no CPU cost to load frames into memory from
- each camera; however, we do pay once we start to do something with the
- data.) Participants might wear "chroma keyed" gloves (wireless gloves
- of a reserved key color) or even body suits. Chroma keying is a well
- known technique for creating simple special effects such as the
- "weatherman" overlay [Ennes77] [Watk90]. We would NOT use this merely
- for special effects, however, but to provide a means of isolating the
- hands so we can build a useful model. On the Engine the amount of
- processing for each chroma key is only about 5-10% of the "real-time
- budget" at 30fps. A second chroma key is used for the background. This
- is similar to Myron Krueger's Videoplace, which uses white backing
- screens. It differs in that Videoplace produces only silhouettes of
- "Artificial Reality" participants as a group and provides a limited
- framework for identifying individual participants. ( Don't get me
- wrong - Videoplace is still a lot of Fun! - I recently spent a day at
- the Franklin Institute watching kids play in it and was impressed by
- the overall effect produced. )
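-
- ( The matte itself is conceptually simple. A toy Python sketch of
- our own - illustrative only, since a broadcast keyer works in a
- hue-based space and softens the matte edge:
-
-     import numpy as np
-
-     def chroma_key_mask(rgb, key_rgb, tol=40.0):
-         # Boolean matte: True wherever a pixel lies within a
-         # Euclidean RGB distance of the reserved key color.
-         d = np.linalg.norm(rgb.astype(float)
-                            - np.asarray(key_rgb, dtype=float), axis=-1)
-         return d < tol
-
- Everything inside the matte is "glove"; everything inside the second,
- background matte can be replaced by the virtual world. )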
-
- The next major technical step is to be able to exploit this interface
- in a useful way. In particular we want to study the effect of multiple
- "individual" participants. If two video channels are paired to each
- participant with a distinct chroma key can we construct a useful model
- and use that to interact with and control the dynamics of the
- visualization process. Our present plan will focus on the real-time
- recognition of simple hand gestures from each "pair of hands".
-
- The idea of using sign language as some posters have suggested is
- very interesting - particularly coupled with a neural net based
- recognizer. Recognizing full ASL in "continuous" real-time by any
- approach is probably too ambitious; however, a useful subset might be
- possible. We have used a neural net approach to detect and remove
- characteristic AM impulse noise (aka "hair dryer noise") in a TV
- receiver [Pearson90]. One network is trained to detect AM impulses on
- an image line and a second network is trained to look at the entire
- image and determine which of the detected pulses are really "false"
- positives. This program runs in continuous real-time on the Princeton
- Engine. We also demonstrated real-time BEP (back-error propagation)
- training on a simple three layer MLP (a total of 86 weights and
- thresholds). For hand signs, however, a new network topology would be
- required - with the input to the network derived from the subsampled
- chroma key image segment of the original image. However, if my
- understanding of "conversational" ASL is correct - and each hand sign
- is typically an entire word or concept - then the resulting training
- set still might be huge. Also, I believe
- that hand motion itself plays a significant role in the interpretation
- of signs - not just in the transition from one sign to another - as in
- cursive writing. This implies that a robust sign recognition system
- would need to compute a motion vector and use that as part of the
- training set. We would appreciate references to current work...
- particularly how one detects individual signs when in "continuous"
- conversation, i.e. when does one sign end and the next begin?
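-
- For readers unfamiliar with the training loop, a minimal Python
- sketch of a three layer MLP trained by back-propagation follows. The
- sizes and data are arbitrary stand-ins, NOT the 86-weight network
- described above:
-
-     import numpy as np
-
-     rng = np.random.default_rng(0)
-     sig = lambda x: 1.0 / (1.0 + np.exp(-x))
-
-     # 8 inputs -> 4 hidden -> 1 output, squared-error gradients.
-     W1, b1 = 0.5 * rng.normal(size=(8, 4)), np.zeros(4)
-     W2, b2 = 0.5 * rng.normal(size=(4, 1)), np.zeros(1)
-     X = rng.normal(size=(32, 8))
-     Y = rng.integers(0, 2, size=(32, 1)).astype(float)
-
-     for epoch in range(100):
-         for x, y in zip(X, Y):
-             h = sig(x @ W1 + b1)            # forward pass
-             o = sig(h @ W2 + b2)
-             do = (o - y) * o * (1 - o)      # output delta
-             dh = (do @ W2.T) * h * (1 - h)  # back-propagated delta
-             W2 -= 0.1 * np.outer(h, do); b2 -= 0.1 * do
-             W1 -= 0.1 * np.outer(x, dh); b1 -= 0.1 * dh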
-
- Lastly, a second video experiment would involve the use of the
- chroma key to present to each of three remote participants a composite
- image of their two neighbors, to form a virtual conference. While this
- interaction is entirely real-time we recognize that there will be a
- significant limit to the quality of interaction between subjects. We
- are interested in the degree of "total" immersion each person
- experiences. If we also mix the audio, does the participant "feel"
- like he or she is having a conversation with three people?
- Unfortunately, I would imagine that the head mounted displays would
- tend to undermine intimacy - "perhaps" we could image warp new faces
- on everybody - just kidding Chris - although now that I think about
- it...
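-
- ( Mechanically, the per-frame composite is just two matte "pastes"
- over a shared background - a sketch reusing the kind of boolean
- matte shown earlier, with hypothetical arrays standing in for the
- video frames:
-
-     import numpy as np
-
-     def composite_neighbors(bg, left, left_mask, right, right_mask):
-         # Paste each chroma-keyed neighbor over a common background.
-         out = bg.copy()
-         out[left_mask] = left[left_mask]
-         out[right_mask] = right[right_mask]
-         return out
-
- The interesting questions are, of course, the human ones above. )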
-
- References
- ----------
-
- [Pentland90] "Computational Complexity Versus Virtual Worlds", A
- Pentland, 1990 Symposium on Interactive 3D Graphics. Vol 24,
- No 2, March 1990, ACM SIGGRAPH
-
- ( Based on the quality of papers in the proceedings, this must have
- been a great conference! )
-
- [Witkin90] "Interactive Dynamics", A Witkin, M Gleicher, W Welch,
- 1990 Symposium on Interactive 3D Graphics. Vol 24, No 2, March
- 1990, ACM SIGGRAPH
-
- [PentE90] "The ThingWorld Modeling System: Virtual Sculpting by Modal
- Forces" A Pentland, I Essa, M Freidmann, B Horowitz.
- 1990 Symposium on Interactive 3D Graphics. Vol 24, No 2, March
- 1990, ACM SIGGRAPH
-
- [Ake88] "High Performance Polygon Rendering" K Akeley, T Jermoluk,
- Computer Graphics Vol 22, No 4, August 1988. ACM
-
- [Levoy90] "Gaze Directed Volume Visualization", M Levoy, W Whitaker.
- 1990 Symposium on Interactive 3D Graphics. Vol 24, No 2, March
- 1990, ACM SIGGRAPH
-
- [Wang90] "A Real-time Optical 3D Tracker For Head Mounted Display
- Systems" J Wang, V Chi, H Fuchs. 1990 Symposium on Interactive
- 3D Graphics. Vol 24, No 2, March 1990, ACM SIGGRAPH
-
- [Minsky90] "Feeling and Seeing: Issues in Force Display" M
- Minsky, M Ouh-young, O Steele, F Brooks, Jr. 1990 Symposium on
- Interactive 3D Graphics. Vol 24, No 2, March 1990, ACM
- SIGGRAPH
-
- [USG91] "Grand Challenges: High Performance Computing and
- Communications" A Report by the Committee on Physical,
- Mathematical and Engineering Sciences; Federal Coordinating
- Council for Science, Engineering, and Technology; Office of
- Science and Technology Policy.
-
- [Fuchs82] "Developing Pixel-Planes, A Smart Memory Based Raster
- Graphics System", H Fuchs, J Poulton, A Paeth, A Bell. 1982
- Conference on Advanced Research in VLSI.
-
- [Kauf88] "Memory and Processing Architecture for 3D Voxel-Based
- Imagery" A Kaufman, R Bakalash, IEEE Computer Graphics and
- Applications, Volume 8. No 11, November 1988, pg 10-23
- reprinted in "Volume Visualization", edited by A Kaufman, IEEE
- Computer Society, 1991.
-
- [Dem86] "Scan Line Access Memories for High Speed Image
- Rasterization", S.G. Demetrescu. Phd Dissertation, Stanford
- University. June 1986.
-
- [Kauf90] "Direct Interaction with a 3D Volumetric Environment", A
- Kaufman, R Yagel, R Bakalash, 1990 Symposium on Interactive 3D
- Graphics. Vol 24, No 2, March 1990, ACM SIGGRAPH
-
- (We also highly recommend, "Volume Visualization", edited by A
- Kaufman, IEEE Computer Society, 1991 which contains a large survey of
- relevant publications.)
-
- [Levoy89] "Design for a Real-Time High Quality Volume Rendering
- Workstation" Chapel Hill Workshop on Volume Visualization,
- 1989, Department of Computer Science, University of North
- Carolina. C. Upson, Editor.
-
- [Frome83] "Incorporating the Human Factor in Color CAD Systems", F.S.
- Frome, 20th Design Automation Conference, June 1983, IEEE
- Computer Society.
-
- [Ennes77] "Television Broadcasting: Equipment, Systems and Operating
- Fundamentals", H.E. Ennes, Howard W. Sams, 1979. pg 319-323
-
- [Watk90] "The Art of Digital Video", John Watkinson, Focal Press,
- 1990, pg 75-77
-
- [Pearson90] "Artificial Neural Networks as TV Signal Processors"
- Clay D. Spence, John C. Pearson, Ronald Sverdlove SPIE
- Proceedings Vol. 1469: Applications of Artificial Neural
- Networks, 1991
-
-
-
-
-
-